From Eager Operators to Block-Based Parallelism
AI023 Lesson 3

Transitioning from PyTorch Eager Mode to Triton requires a shift from viewing tensors as monolithic objects to viewing them as collections of discrete, manageable blocks or tiles.

1. PyTorch vs. Triton Tensors

It is vital to distinguish Triton tensors from PyTorch tensors. A PyTorch tensor is a host-side Python object that wraps shape, dtype, device, stride, and storage metadata around a block of memory. In contrast, a Triton kernel receives raw device pointers and explicitly loads and stores blocks of data from them, which enables much lower-level control over memory access.
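The metadata-wrapper idea can be made concrete. The sketch below uses NumPy (whose arrays have the same metadata/storage split; PyTorch exposes the equivalents via `t.shape`, `t.stride()`, and `t.data_ptr()`) because it runs without a GPU:

```python
import numpy as np

# A NumPy array, like a PyTorch tensor, is a host-side wrapper:
# shape/strides/dtype metadata plus a pointer to flat storage.
t = np.arange(12, dtype=np.float32).reshape(3, 4)

print(t.shape)        # metadata: (3, 4)
print(t.strides)      # metadata: byte strides, (16, 4) for row-major float32
print(t.ctypes.data)  # the raw address a low-level kernel would receive

# A transposed view shares the SAME storage pointer; only metadata changes.
v = t.T
assert v.ctypes.data == t.ctypes.data
```

A Triton kernel never sees the wrapper: the host passes it the raw pointer, and the kernel itself must compute which offsets to read.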

2. The Eager Bottleneck

In standard eager execution, every operation (e.g., an addition followed by a ReLU) requires its own kernel launch and a round trip through global memory: the intermediate result is written out and then read back in. For memory-bound elementwise workloads, these round trips dominate runtime. Triton overcomes this by fusing the operations into a single kernel that processes blocks of data (e.g., 128, 256, or 512 elements) entirely in on-chip memory, writing only the final result back.
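The fusion idea can be sketched in pure NumPy (no GPU required): the eager version materializes a full-size intermediate between two passes, while the fused version applies both operations to each block before storing it. The function names and BLOCK size here are illustrative, not Triton APIs:

```python
import numpy as np

BLOCK = 128  # block size; 128, 256, or 512 are typical choices

def eager_add_relu(x, y):
    # Eager style: two separate "kernels", with a full-size intermediate
    # written to (global) memory between them.
    tmp = x + y                # kernel 1: writes tmp
    return np.maximum(tmp, 0)  # kernel 2: re-reads tmp

def fused_add_relu(x, y):
    # Fused block style: each block is loaded once, both ops are applied
    # while it stays "on chip", and only the final result is stored.
    out = np.empty_like(x)
    n = x.shape[0]
    for pid in range(-(-n // BLOCK)):  # ceil-div: one program per block
        lo, hi = pid * BLOCK, min((pid + 1) * BLOCK, n)
        blk = x[lo:hi] + y[lo:hi]        # add, in registers/SRAM
        out[lo:hi] = np.maximum(blk, 0)  # relu, then a single store
    return out

x = np.random.randn(1000).astype(np.float32)
y = np.random.randn(1000).astype(np.float32)
assert np.allclose(fused_add_relu(x, y), eager_add_relu(x, y))
```

In a real Triton kernel the `for pid` loop disappears: the GPU launches one program instance per block, and each instance runs the loop body in parallel.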

3. The Block-Based Paradigm

Instead of the per-thread, scalar programming model of CUDA, Triton uses SPMD (Single Program, Multiple Data) at the block level. You write one kernel, and Triton launches many instances of it across a grid; each instance reads its program_id to compute which "chunk" of memory it owns.
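The index arithmetic each instance performs can be sketched in plain Python. The helper below mirrors the canonical Triton pattern (`offs = pid * BLOCK + tl.arange(0, BLOCK)`, `mask = offs < n`), with `block_offsets` being an illustrative name, not a Triton API:

```python
import numpy as np

def block_offsets(pid, block_size, n):
    # Each program instance owns one contiguous chunk of block_size
    # elements, identified purely by its pid.
    offs = pid * block_size + np.arange(block_size)
    mask = offs < n  # guards the ragged final block
    return offs, mask

n, BLOCK = 300, 128
grid = -(-n // BLOCK)  # ceil-div, like triton.cdiv(n, BLOCK)
assert grid == 3       # pids 0, 1, 2

offs, mask = block_offsets(2, BLOCK, n)
assert offs[0] == 256            # last block starts at element 256
assert mask.sum() == 300 - 256   # only 44 of its 128 lanes are valid
```

The mask is what makes a fixed block size safe for sizes that do not divide evenly: out-of-range lanes are simply disabled on load and store.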

Diagram: a PyTorch tensor (metadata wrapper) partitioned into Block 0 (pid 0), Block 1 (pid 1), and Block 2 (pid 2).

4. Environment Setup

To begin, install Triton in a clean environment (using Conda or venv) to ensure no dependency conflicts with existing CUDA toolkits: pip install triton.
